# MoTrPAC R packages 
library(MotrpacRatTraining6mo) # v1.5.0
# MotrpacRatTraining6moData v1.8.0 is automatically attached

# for plotting
library(ggplot2)

knitr::opts_chunk$set(echo = TRUE)

Background

About these R package

These package provides functions to fetch, explore, and reproduce the processed data and downstream analysis results presented in the main paper for the first large-scale multi-omic multi-tissue endurance exercise training study conducted in young adult rats by the Molecular Transducers of Physical Activity Consortium (MoTrPAC). Find the preprint on bioRxiv.

While functions in MotrpacRatTraining6moData can be used by themselves, they were primarily written to analyze data in the MotrpacRatTraining6moData R package.

Note: Data objects from the MotrpacRatTraining6moData package are indicated as variable names in all caps, e.g. PHENO, TRNSCRPT_LIVER_RAW_COUNTS.

Tip: To learn more about any data object or function, use ? to retrieve the documentation, e.g., ?METAB_FEATURE_ID_MAP, ?load_sample_data. Note that both MotrpacRatTraining6mo and MotrpacRatTraining6moData must be installed and loaded with library() for this to work.

About MoTrPAC

MoTrPAC is a national research consortium designed to discover and perform preliminary characterization of the range of molecular transducers (the “molecular map”) that underlie the effects of physical activity in humans. The program’s goal is to study the molecular changes that occur during and after exercise and ultimately to advance the understanding of how physical activity improves and preserves health. The six-year program is the largest targeted NIH investment of funds into the mechanisms of how physical activity improves health and prevents disease. See motrpac.org and motrpac-data.org for more details.

Study design

Details of the experimental design can be found in the supplementary methods of the bioRxiv preprint. Briefly, 6-month-old young adult rats were subjected to progressive endurance exercise training for 1, 2, 4, or 8 weeks, with tissues collected 48 hours after the last training bout. Sex-matched sedentary, untrained rats were used as controls. Whole blood, plasma, and 18 solid tissues were analyzed using genomics, proteomics, metabolomics, and protein immunoassay technologies, with most assays performed in a subset of these tissues. Depending on the assay, between 3 and 6 replicates per sex per time point were analyzed.

Tissue and assay abbreviations

It is important to be aware of the tissue and assay abbreviations because they are used to name data objects and define arguments for many functions.

Tissues

  • ADRNL: adrenal gland
  • BAT: brown adipose tissue
  • BLOOD: whole blood
  • COLON: colon
  • CORTEX: cerebral cortex
  • HEART: heart
  • HIPPOC: hippocampus
  • HYPOTH: hypothalamus
  • KIDNEY: kidney
  • LIVER: liver
  • LUNG: lung
  • OVARY: ovaries (female gonads)
  • PLASMA: plasma from blood
  • SKM-GN: gastrocnemius (leg skeletal muscle)
  • SKM-VL: vastus lateralis (leg skeletal muscle)
  • SMLINT: small intestine
  • SPLEEN: spleen
  • TESTES: testes (male gonads)
  • VENACV: vena cava
  • WAT-SC: subcutaneous white adipose tissue

Assays/omes

  • ACETYL: acetylproteomics; protein site acetylation
  • ATAC: chromatin accessibility, ATAC-seq data
  • IMMUNO: multiplexed immunoassays (cytokines and hormones)
  • METAB: metabolomics and lipidomics
  • METHYL: DNA methylation, RRBS data
  • PHOSPHO: phosphoproteomics; protein site phosphorylation
  • PROT: global proteomics; protein abundance
  • TRNSCRPT: transcriptomics, RNA-Seq data
  • UBIQ: ubiquitylome; protein site ubiquitination

The vectors of abbreviations are also available in TISSUE_ABBREV and ASSAY_ABBREV:

TISSUE_ABBREV
##  [1] "ADRNL"  "BAT"    "BLOOD"  "COLON"  "CORTEX" "HEART"  "HIPPOC" "HYPOTH"
##  [9] "KIDNEY" "LIVER"  "LUNG"   "OVARY"  "PLASMA" "SKM-GN" "SKM-VL" "SMLINT"
## [17] "SPLEEN" "TESTES" "VENACV" "WAT-SC"
ASSAY_ABBREV
## [1] "ACETYL"   "ATAC"     "IMMUNO"   "METAB"    "METHYL"   "PHOSPHO"  "PROT"    
## [8] "TRNSCRPT" "UBIQ"

Guiding question

Let’s see how we can use these R packages to identify an interesting shared multi-omic signal in the heart (HEART) and gastrocnemius (SKM-GN) muscle tissues.

Sample-level data

First, let’s look at a set of functions used to easily load data from the MotrpacRatTraining6moData R package.

Use load_sample_data() to load sample-level data for a specific ome and tissue.

?load_sample_data

# heart protein expression 
heart_prot = load_sample_data("HEART", "PROT")
## PROT_HEART_NORM_DATA
# heart immunoassay data
heart_immuno = load_sample_data("HEART", "IMMUNO")
## IMMUNO HEART normalized data from IMMUNO_NORM_DATA_FLAT
# gastroc gene expression
gastroc_rna = load_sample_data("SKM-GN", "TRNSCRPT")
## TRNSCRPT_SKMGN_NORM_DATA
# gastroc metabolomics
gastroc_metab = load_sample_data("SKM-GN", "METAB")
## METAB SKM-GN normalized data from METAB_NORM_DATA_FLAT

Let’s look at the format of these data frames.

heart_prot[1:8,1:5]
##                     feature     feature_ID tissue assay 90237015805
## 1                      <NA> XP_017456475.1  HEART  PROT     0.00769
## 2                      <NA> XP_017447817.1  HEART  PROT     0.09326
## 3                      <NA>    NP_446341.1  HEART  PROT    -0.11469
## 4                      <NA>    NP_071796.2  HEART  PROT     0.05903
## 5                      <NA> NP_001157776.1  HEART  PROT     1.24616
## 6 PROT;HEART;XP_003754359.1 XP_003754359.1  HEART  PROT    -0.04140
## 7                      <NA> XP_017459445.1  HEART  PROT    -0.03179
## 8                      <NA> XP_017453129.1  HEART  PROT    -0.12237

The feature column is non-NA only for training-regulated features, which provides a convenient way of identifying this subset of features.

Sometimes it is more convenient to have all-numeric data frames. We can easily convert these data frames to this format with df_to_numeric().

df_to_numeric(heart_prot, rownames = "feature_ID")[1:5,1:5]
##                90237015805 90245015805 90441015805 90420015805 90289015805
## XP_017456475.1     0.00769    -0.05952     0.10992    -0.00229     0.04674
## XP_017447817.1     0.09326    -0.08879     0.31763     0.07739    -0.06031
## NP_446341.1       -0.11469     0.10623     0.08138     0.08464    -0.16039
## NP_071796.2        0.05903    -0.06490     0.05673    -0.07494     0.06263
## NP_001157776.1     1.24616     0.63626     0.15487    -0.39231    -0.64953

Note that load_sample_data() also prints a message with the name of the object in MotrpacRatTraining6moData that is being returned. Therefore, instead of using load_sample_data(), we could call the object directly.

PROT_HEART_NORM_DATA[1:8,1:5]
##                     feature     feature_ID tissue assay 90237015805
## 1                      <NA> XP_017456475.1  HEART  PROT     0.00769
## 2                      <NA> XP_017447817.1  HEART  PROT     0.09326
## 3                      <NA>    NP_446341.1  HEART  PROT    -0.11469
## 4                      <NA>    NP_071796.2  HEART  PROT     0.05903
## 5                      <NA> NP_001157776.1  HEART  PROT     1.24616
## 6 PROT;HEART;XP_003754359.1 XP_003754359.1  HEART  PROT    -0.04140
## 7                      <NA> XP_017459445.1  HEART  PROT    -0.03179
## 8                      <NA> XP_017453129.1  HEART  PROT    -0.12237

Let’s plot the sample-level data for a single differential feature.

head(heart_prot$feature[!is.na(heart_prot$feature)])
## [1] "PROT;HEART;XP_003754359.1" "PROT;HEART;NP_058935.2"   
## [3] "PROT;HEART;XP_006252013.1" "PROT;HEART;XP_017452487.1"
## [5] "PROT;HEART;XP_001058477.1" "PROT;HEART;XP_017444239.1"
plot_feature_normalized_data(feature="PROT;HEART;XP_003754359.1", 
                             add_gene_symbol = TRUE)
plot_feature_normalized_data(feature="PROT;HEART;XP_003754359.1", 
                             add_gene_symbol = TRUE,
                             facet_by_sex = TRUE)

We can also use combine_normalized_data() to load normalized data from multiple omes or tissues simultaneously. For example, let’s load all non-epigenetic data from the gastroc.

all_gastroc = combine_normalized_data(tissues="SKM-GN", include_epigen = FALSE)
## Warning in combine_normalized_data(tissues = "SKM-GN", include_epigen = FALSE):
## 'include_epigen' is FALSE. Excluding ATAC and METHYL data.
## IMMUNO SKM-GN normalized data from IMMUNO_NORM_DATA_FLAT
## METAB SKM-GN normalized data from METAB_NORM_DATA_FLAT
## PHOSPHO_SKMGN_NORM_DATA
## PROT_SKMGN_NORM_DATA
## TRNSCRPT_SKMGN_NORM_DATA
all_gastroc[1:5,1:8]
##              feature     feature_ID tissue  assay       dataset  10023259
## 1               <NA>    ADIPONECTIN SKM-GN IMMUNO   ADIPONECTIN 15.129967
## 2               <NA> PAI-1/SERPIN-E SKM-GN IMMUNO      SERPIN-E  5.727920
## 3 IMMUNO;SKM-GN;GCSF           GCSF SKM-GN IMMUNO rat-mag27plex  4.087463
## 4               <NA>        EOTAXIN SKM-GN IMMUNO rat-mag27plex  5.475733
## 5               <NA>          GMCSF SKM-GN IMMUNO rat-mag27plex  4.426265
##    10024735  10025626
## 1 15.150025 15.124909
## 2  5.491853  5.321928
## 3  4.044394  4.000000
## 4  5.087463  5.375039
## 5  4.247928  4.807355
# how many features?
nrow(all_gastroc)
## [1] 50712
# how many differential features?
nrow(all_gastroc[!is.na(all_gastroc$feature),])
## [1] 2296
# what omes?
table(all_gastroc$assay[!is.na(all_gastroc$feature)])
## 
##   IMMUNO    METAB  PHOSPHO     PROT TRNSCRPT 
##        7      338      694      691      566

Note that column names are now PIDs instead of vial labels. This is because measurements from multiple vial labels (samples) from the same PID (animal) are included in the same column.

Differential analysis results

Next, let’s load the differential analysis results. Recall there are two types of differential analysis results: timewise and training. We use the adjusted p-value from the training differential analysis to identify training-regulated features (selection_fdr); we use the summary statistics from the timewise differential analysis to quantify time- and sex- specific training effects. The results below are the timewise results and adjusted p-values from the training results. Lesser-used additional training summary statistics are available through load_training_da().

First, let’s consider training-regulated features only. These are conveniently provided in TRAINING_REGULATED_FEATURES, which we can easily subset to any data set of interest.

head(TRAINING_REGULATED_FEATURES)
##                             feature  assay assay_code tissue tissue_code
## 1 ACETYL;HEART;NP_001003673.1_K477k ACETYL    prot-ac  HEART   t58-heart
## 2 ACETYL;HEART;NP_001003673.1_K477k ACETYL    prot-ac  HEART   t58-heart
## 3 ACETYL;HEART;NP_001003673.1_K477k ACETYL    prot-ac  HEART   t58-heart
## 4 ACETYL;HEART;NP_001003673.1_K477k ACETYL    prot-ac  HEART   t58-heart
## 5 ACETYL;HEART;NP_001003673.1_K477k ACETYL    prot-ac  HEART   t58-heart
## 6 ACETYL;HEART;NP_001003673.1_K477k ACETYL    prot-ac  HEART   t58-heart
##             feature_ID non_redundant_feature_ID platform    sex training_group
## 1 NP_001003673.1_K477k                     <NA>     <NA> female             1w
## 2 NP_001003673.1_K477k                     <NA>     <NA> female             2w
## 3 NP_001003673.1_K477k                     <NA>     <NA> female             4w
## 4 NP_001003673.1_K477k                     <NA>     <NA> female             8w
## 5 NP_001003673.1_K477k                     <NA>     <NA>   male             1w
## 6 NP_001003673.1_K477k                     <NA>     <NA>   male             2w
##   timewise_logFC timewise_logFC_se timewise_p_value timewise_zscore
## 1    -0.02737365        0.07346830       0.71228171        -0.37259
## 2    -0.01935973        0.07346830       0.79410231        -0.26351
## 3    -0.05102628        0.07346830       0.49311823        -0.69453
## 4    -0.16677605        0.07346830       0.03116679        -2.27004
## 5     0.03933023        0.08168714       0.63486235         0.48147
## 6     0.10815561        0.06773140       0.12435304         1.59683
##   meta_reg_het_p meta_reg_pvalue training_p_value training_q
## 1             NA              NA      0.002814029  0.0275727
## 2             NA              NA      0.002814029  0.0275727
## 3             NA              NA      0.002814029  0.0275727
## 4             NA              NA      0.002814029  0.0275727
## 5             NA              NA      0.002814029  0.0275727
## 6             NA              NA      0.002814029  0.0275727
heart_gastroc_sig = TRAINING_REGULATED_FEATURES[TRAINING_REGULATED_FEATURES$tissue %in% c("HEART","SKM-GN"),]
# how many differential features?
nrow(heart_gastroc_sig)
## [1] 47176
# what tissues and omes?
table(heart_gastroc_sig$tissue, heart_gastroc_sig$assay)
##         
##          ACETYL ATAC IMMUNO METAB METHYL PHOSPHO PROT TRNSCRPT UBIQ
##   HEART    3600  600     40  4544    856    3488 5544     5776  192
##   SKM-GN      0 3536     56  2384    952    5552 5528     4528    0

We can also load the full set of differential analysis results. This consumes some memory, so let’s skip this for now.

# for a single tissue
heart_da = combine_da_results(tissues="HEART")
# for multiple tissues
heart_gastroc_da = combine_da_results(tissues=c("SKM-GN","HEART"))

Plot the differential analysis results for a single training-regulated feature.

# plot the first differential feature
head(heart_gastroc_sig$feature)
## [1] "ACETYL;HEART;NP_001003673.1_K477k" "ACETYL;HEART;NP_001003673.1_K477k"
## [3] "ACETYL;HEART;NP_001003673.1_K477k" "ACETYL;HEART;NP_001003673.1_K477k"
## [5] "ACETYL;HEART;NP_001003673.1_K477k" "ACETYL;HEART;NP_001003673.1_K477k"
plot_feature_logfc(feature = "ACETYL;HEART;NP_001003673.1_K477k", facet_by_sex = TRUE, add_gene_symbol = TRUE)
# how does this compare to the sample-level data?
plot_feature_normalized_data(feature = "ACETYL;HEART;NP_001003673.1_K477k", facet_by_sex = TRUE, add_gene_symbol = TRUE)

Graphical clustering results

Across all tissues and omes, we had over 30,000 training-regulated features. We used a graphical approach to identify groups of features with similar temporal dynamics. See more details in the “Get Started” tutorial.

The graph representation of the repfdr results are provided in GRAPH_COMPONENTS (node and edge lists) and GRAPH_STATES (data frame representation). Features are specified in the format “[ASSAY_ABBREV];[TISSUE_ABBREV];[feature_ID]”, e.g. “PHOSPHO;SKM-GN;NP_001006973.1_S34s”, “METAB;SKM-GN;1-methylnicotinamide”.

Plot feature trajectories

Let’s start exploring the heart and gastroc graphical clustering results features by plotting the graphs themselves.

Start by plotting graphs for gastroc and heart separately.

# gastroc
# get_tree_plot_for_tissue(tissues=c("SKM-GN"))
get_tree_plot_for_tissue(tissues=c("SKM-GN"),
                         omes=c("PROT","PHOSPHO","TRNSCRPT"),
                         max_trajectories = 3,
                         parallel_edges_by_ome = TRUE,
                         curvature = 0.5,
                         edge_alpha_range = c(0.5, 1))
## Using "sugiyama" as default layout
## Warning: Ignoring `graph` as layout is already calculated
## ℹ Pass the calculated layout to the `graph` argument to silence this warning

# heart
# get_tree_plot_for_tissue(tissues=c("HEART"),
#                          max_trajectories = 3,
#                          parallel_edges_by_ome = TRUE,
#                          curvature = 1,
#                          edge_alpha_range = c(0.5, 1))
get_tree_plot_for_tissue(tissues=c("HEART"),
                         omes=c("PROT","PHOSPHO","TRNSCRPT","METAB","ACETYL"),
                         max_trajectories = 3,
                         parallel_edges_by_ome = TRUE,
                         curvature = 0.8,
                         edge_alpha_range = c(0.5, 1))
## Using "sugiyama" as default layout
## Warning: Ignoring `graph` as layout is already calculated
## ℹ Pass the calculated layout to the `graph` argument to silence this warning

Now plot both heart and gastroc features together. For the sake of visualization, let’s limit it to popular omes in both tissues.

# edges by tissue
get_tree_plot_for_tissue(tissues=c("SKM-GN","HEART"),
                         omes=c("PROT","PHOSPHO","TRNSCRPT","METAB"),
                         max_trajectories = 3,
                         parallel_edges_by_tissue = TRUE,
                         curvature = 0.5,
                         edge_alpha_range = c(0.5, 1))
## Using "sugiyama" as default layout
## Warning: Ignoring `graph` as layout is already calculated
## ℹ Pass the calculated layout to the `graph` argument to silence this warning

# edges by ome
get_tree_plot_for_tissue(tissues=c("SKM-GN","HEART"),
                         omes=c("PROT","PHOSPHO","TRNSCRPT","METAB"),
                         max_trajectories = 3,
                         parallel_edges_by_ome = TRUE,
                         curvature = 0.8,
                         edge_alpha_range = c(0.5, 1))
## Using "sugiyama" as default layout
## Warning: Ignoring `graph` as layout is already calculated
## ℹ Pass the calculated layout to the `graph` argument to silence this warning

There’s a lot going on in each of these three trajectories. We will focus on these three paths in the heart and gastroc for the rest of this workshop.

Get underlying features

Let’s determine the features that belong to each of these paths. This can be done one of a few ways, but I’ll show two here.

First, how can we get the names of these paths without actually having to type it all out? Since we’re looking at the three largest paths, we can use extract_tissue_sets().

sets = extract_tissue_sets(tissues=c("HEART","SKM-GN"), k = 3, add_week8 = FALSE)
# select paths only
names(sets)
## [1] "8w_F1_M1"                                      
## [2] "4w_F1_M1"                                      
## [3] "8w_F-1_M-1"                                    
## [4] "4w_F1_M1---8w_F1_M1"                           
## [5] "1w_F0_M0---2w_F0_M0"                           
## [6] "2w_F1_M1---4w_F1_M1"                           
## [7] "1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1"        
## [8] "1w_F-1_M-1->2w_F-1_M-1->4w_F-1_M-1->8w_F-1_M-1"
## [9] "1w_F0_M1->2w_F0_M1->4w_F1_M1->8w_F1_M1"
sets = sets[grepl("->",names(sets))]
names(sets)
## [1] "1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1"        
## [2] "1w_F-1_M-1->2w_F-1_M-1->4w_F-1_M-1->8w_F-1_M-1"
## [3] "1w_F0_M1->2w_F0_M1->4w_F1_M1->8w_F1_M1"
lapply(sets, head)
## $`1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1`
## [1] "ACETYL;HEART;NP_001004220.1_K116k" "ACETYL;HEART;NP_001006973.1_K401k"
## [3] "ACETYL;HEART;NP_001007621.1_K354k" "ACETYL;HEART;NP_001014183.1_K193k"
## [5] "ACETYL;HEART;NP_001014183.1_K199k" "ACETYL;HEART;NP_001030123.1_K581k"
## 
## $`1w_F-1_M-1->2w_F-1_M-1->4w_F-1_M-1->8w_F-1_M-1`
## [1] "ACETYL;HEART;NP_001020795.1_K31k"  "ACETYL;HEART;NP_001162612.1_K438k"
## [3] "ACETYL;HEART;NP_001188284.1_K17k"  "ACETYL;HEART;NP_001316822.1_K4k"  
## [5] "ACETYL;HEART;NP_037320.1_K811k"    "ACETYL;HEART;NP_058771.2_K240k"   
## 
## $`1w_F0_M1->2w_F0_M1->4w_F1_M1->8w_F1_M1`
## [1] "ACETYL;HEART;NP_001005550.1_K709k" "ACETYL;HEART;NP_001094198.1_K204k"
## [3] "ACETYL;HEART;NP_037023.1_K276k"    "ACETYL;HEART;NP_059002.2_K247k"   
## [5] "ACETYL;HEART;NP_071579.1_K78k"     "ACETYL;HEART;NP_071579.1_K94k"

Note that this function does not separate paths by tissue when constructing the lists of features.

If we weren’t looking at the top N paths, we could construct the name of the each path and subset GRAPH_STATES. In addition to subsetting the path column as shown here, we can use the state_*w columns to select features in any node, edge, or path.

head(GRAPH_STATES)
##                                                             feature    ome
## ACETYL;HEART;NP_001003673.1_K477k ACETYL;HEART;NP_001003673.1_K477k ACETYL
## ACETYL;HEART;NP_001004072.2_K77k   ACETYL;HEART;NP_001004072.2_K77k ACETYL
## ACETYL;HEART;NP_001004072.2_K83k   ACETYL;HEART;NP_001004072.2_K83k ACETYL
## ACETYL;HEART;NP_001004085.1_K186k ACETYL;HEART;NP_001004085.1_K186k ACETYL
## ACETYL;HEART;NP_001004085.1_K261k ACETYL;HEART;NP_001004085.1_K261k ACETYL
## ACETYL;HEART;NP_001004085.1_K268k ACETYL;HEART;NP_001004085.1_K268k ACETYL
##                                   tissue           feature_ID state_1w state_2w
## ACETYL;HEART;NP_001003673.1_K477k  HEART NP_001003673.1_K477k    F0_M0    F0_M0
## ACETYL;HEART;NP_001004072.2_K77k   HEART  NP_001004072.2_K77k    F0_M0    F0_M0
## ACETYL;HEART;NP_001004072.2_K83k   HEART  NP_001004072.2_K83k  F-1_M-1  F-1_M-1
## ACETYL;HEART;NP_001004085.1_K186k  HEART NP_001004085.1_K186k    F0_M1    F0_M1
## ACETYL;HEART;NP_001004085.1_K261k  HEART NP_001004085.1_K261k    F0_M1    F0_M1
## ACETYL;HEART;NP_001004085.1_K268k  HEART NP_001004085.1_K268k    F0_M1    F0_M1
##                                   state_4w state_8w
## ACETYL;HEART;NP_001003673.1_K477k    F0_M0  F-1_M-1
## ACETYL;HEART;NP_001004072.2_K77k     F1_M0    F1_M0
## ACETYL;HEART;NP_001004072.2_K83k    F0_M-1    F0_M0
## ACETYL;HEART;NP_001004085.1_K186k    F0_M0   F0_M-1
## ACETYL;HEART;NP_001004085.1_K261k    F0_M0   F0_M-1
## ACETYL;HEART;NP_001004085.1_K268k    F0_M0   F0_M-1
##                                                                          path
## ACETYL;HEART;NP_001003673.1_K477k    1w_F0_M0->2w_F0_M0->4w_F0_M0->8w_F-1_M-1
## ACETYL;HEART;NP_001004072.2_K77k       1w_F0_M0->2w_F0_M0->4w_F1_M0->8w_F1_M0
## ACETYL;HEART;NP_001004072.2_K83k  1w_F-1_M-1->2w_F-1_M-1->4w_F0_M-1->8w_F0_M0
## ACETYL;HEART;NP_001004085.1_K186k     1w_F0_M1->2w_F0_M1->4w_F0_M0->8w_F0_M-1
## ACETYL;HEART;NP_001004085.1_K261k     1w_F0_M1->2w_F0_M1->4w_F0_M0->8w_F0_M-1
## ACETYL;HEART;NP_001004085.1_K268k     1w_F0_M1->2w_F0_M1->4w_F0_M0->8w_F0_M-1
##                                                                         tissue_path
## ACETYL;HEART;NP_001003673.1_K477k    HEART:1w_F0_M0->2w_F0_M0->4w_F0_M0->8w_F-1_M-1
## ACETYL;HEART;NP_001004072.2_K77k       HEART:1w_F0_M0->2w_F0_M0->4w_F1_M0->8w_F1_M0
## ACETYL;HEART;NP_001004072.2_K83k  HEART:1w_F-1_M-1->2w_F-1_M-1->4w_F0_M-1->8w_F0_M0
## ACETYL;HEART;NP_001004085.1_K186k     HEART:1w_F0_M1->2w_F0_M1->4w_F0_M0->8w_F0_M-1
## ACETYL;HEART;NP_001004085.1_K261k     HEART:1w_F0_M1->2w_F0_M1->4w_F0_M0->8w_F0_M-1
## ACETYL;HEART;NP_001004085.1_K268k     HEART:1w_F0_M1->2w_F0_M1->4w_F0_M0->8w_F0_M-1
clusters = list()
#paths = names(sets)
paths = c("1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1", "1w_F-1_M-1->2w_F-1_M-1->4w_F-1_M-1->8w_F-1_M-1", "1w_F0_M1->2w_F0_M1->4w_F1_M1->8w_F1_M1")
tissues = c("SKM-GN", "HEART")
for (path in paths){
  for(t in tissues){
    name = paste0(t, ":", path)
    clusters[[name]] = as.vector(na.omit(GRAPH_STATES$feature[GRAPH_STATES$path == path & 
                                                                GRAPH_STATES$tissue == t]))
  } 
}
lapply(clusters, length)
## $`SKM-GN:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1`
## [1] 249
## 
## $`HEART:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1`
## [1] 357
## 
## $`SKM-GN:1w_F-1_M-1->2w_F-1_M-1->4w_F-1_M-1->8w_F-1_M-1`
## [1] 229
## 
## $`HEART:1w_F-1_M-1->2w_F-1_M-1->4w_F-1_M-1->8w_F-1_M-1`
## [1] 177
## 
## $`SKM-GN:1w_F0_M1->2w_F0_M1->4w_F1_M1->8w_F1_M1`
## [1] 107
## 
## $`HEART:1w_F0_M1->2w_F0_M1->4w_F1_M1->8w_F1_M1`
## [1] 138

Let’s move forward with this list of features per cluster. We’ll rename them to make them easier to keep track of.

MY_CLUSTERS = clusters

We can also use the check_cluster_res_format() function to transform the list to an annotated data frame.

MY_CLUSTER_DF = check_cluster_res_format(clusters)
head(MY_CLUSTER_DF)
##                                 feature
## 1 ATAC;SKM-GN;chr15:101579022-101579761
## 2  ATAC;SKM-GN;chr1:221194280-221195125
## 3  ATAC;SKM-GN;chr1:221199295-221200124
## 4   ATAC;SKM-GN;chr20:34579127-34579734
## 5  ATAC;SKM-GN;chr5:114013062-114014681
## 6  ATAC;SKM-GN;chrX:119151643-119154560
##                                         cluster  ome tissue
## 1 SKM-GN:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1 ATAC SKM-GN
## 2 SKM-GN:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1 ATAC SKM-GN
## 3 SKM-GN:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1 ATAC SKM-GN
## 4 SKM-GN:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1 ATAC SKM-GN
## 5 SKM-GN:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1 ATAC SKM-GN
## 6 SKM-GN:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1 ATAC SKM-GN
##                  feature_ID
## 1 chr15:101579022-101579761
## 2  chr1:221194280-221195125
## 3  chr1:221199295-221200124
## 4   chr20:34579127-34579734
## 5  chr5:114013062-114014681
## 6  chrX:119151643-119154560

Plot groups of features

We can visualize the composition of these clusters in terms of tissues and omes.

plot_features_per_cluster(MY_CLUSTERS) # could also use MY_CLUSTER_DF as input

We can also look at the trajectories of these features’ abundances over the training time course. Let’s look at a single cluster.

plot_feature_trajectories(MY_CLUSTERS$`SKM-GN:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1`, 
                          title="SKM-GN:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1")

Pathway enrichment

One important tool we use to interpret the biology underlying graphical clusters is pathway enrichment analysis. In this section, we look at functions to explore the existing pathway enrichment results presented in the manuscript.

To perform your own pathway enrichment analysis, either with MoTrPAC data from this study or your own data, see cluster_pathway_enrichment(), gene_pathway_enrichment(), and custom_cluster_pathway_enrichment().

Explore existing results

For the manuscript, we performed pathway enrichment for the 2 largest nodes, 2 largest edges, 10 largest non-null paths, and all 8-week nodes from the graphical representation of training-regulated features in each tissue. This set of graphical clusters can be extracted using extract_main_clusters(). Pathway enrichment results were adjusted over all tests using IHW with tissue as a covariate. These pathway enrichment results are available in GRAPH_PW_ENRICH.

Because our 3 paths of interest are among the 10 largest paths in both tissues, the pathway enrichment results for these features are already available in GRAPH_PW_ENRICH. Let’s take a look.

head(GRAPH_PW_ENRICH)
##     query significant term_size query_size intersection_size precision
## 1 query_1        TRUE        63         65                18 0.2769231
## 2 query_1        TRUE        23         65                12 0.1846154
## 3 query_1        TRUE        24         65                 9 0.1384615
## 4 query_1        TRUE        25         65                 9 0.1384615
## 5 query_1        TRUE        34         65                10 0.1538462
## 6 query_1        TRUE        23         65                 8 0.1230769
##      recall    term_id source                                  term_name
## 1 0.2857143 KEGG:01200   KEGG                          Carbon metabolism
## 2 0.5217391 KEGG:00020   KEGG                  Citrate cycle (TCA cycle)
## 3 0.3750000 KEGG:00640   KEGG                      Propanoate metabolism
## 4 0.3600000 KEGG:01212   KEGG                      Fatty acid metabolism
## 5 0.2941176 KEGG:00280   KEGG Valine, leucine and isoleucine degradation
## 6 0.3478261 KEGG:00071   KEGG                     Fatty acid degradation
##   effective_domain_size source_order    parents
## 1                  1394          184 KEGG:00000
## 2                  1394            3 KEGG:00000
## 3                  1394          111 KEGG:00000
## 4                  1394          186 KEGG:00000
## 5                  1394           32 KEGG:00000
## 6                  1394           11 KEGG:00000
##                                                                              evidence_codes
## 1 KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG
## 2                               KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG
## 3                                              KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG
## 4                                              KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG
## 5                                         KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG
## 6                                                   KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG
##                                                                                                                                                                                                                                                                                                                                            intersection
## 1 ENSRNOG00000003653,ENSRNOG00000015020,ENSRNOG00000008103,ENSRNOG00000011329,ENSRNOG00000008755,ENSRNOG00000001177,ENSRNOG00000024128,ENSRNOG00000009994,ENSRNOG00000010277,ENSRNOG00000018522,ENSRNOG00000023520,ENSRNOG00000007895,ENSRNOG00000006364,ENSRNOG00000028557,ENSRNOG00000017481,ENSRNOG00000013949,ENSRNOG00000005686,ENSRNOG00000050843
## 2                                                                                                                   ENSRNOG00000003653,ENSRNOG00000015020,ENSRNOG00000008103,ENSRNOG00000024128,ENSRNOG00000009994,ENSRNOG00000010277,ENSRNOG00000023520,ENSRNOG00000007895,ENSRNOG00000006364,ENSRNOG00000017481,ENSRNOG00000013949,ENSRNOG00000005686
## 3                                                                                                                                                                            ENSRNOG00000015029,ENSRNOG00000008755,ENSRNOG00000001177,ENSRNOG00000018522,ENSRNOG00000006364,ENSRNOG00000028557,ENSRNOG00000017481,ENSRNOG00000005686,ENSRNOG00000050843
## 4                                                                                                                                                                            ENSRNOG00000018114,ENSRNOG00000011413,ENSRNOG00000008755,ENSRNOG00000001177,ENSRNOG00000015840,ENSRNOG00000010697,ENSRNOG00000018522,ENSRNOG00000013766,ENSRNOG00000010800
## 5                                                                                                                                                         ENSRNOG00000015029,ENSRNOG00000008063,ENSRNOG00000001177,ENSRNOG00000010697,ENSRNOG00000018522,ENSRNOG00000013766,ENSRNOG00000010800,ENSRNOG00000006364,ENSRNOG00000028557,ENSRNOG00000050843
## 6                                                                                                                                                                                               ENSRNOG00000018114,ENSRNOG00000008843,ENSRNOG00000008755,ENSRNOG00000001177,ENSRNOG00000010697,ENSRNOG00000018522,ENSRNOG00000013766,ENSRNOG00000010800
##   gost_adj_p_value computed_p_value        cluster tissue    ome kegg_id
## 1     2.237958e-09     5.204554e-11 HEART:8w_F1_M1  HEART ACETYL    <NA>
## 2     2.237958e-09     3.414274e-11 HEART:8w_F1_M1  HEART ACETYL    <NA>
## 3     9.644681e-06     4.485898e-07 HEART:8w_F1_M1  HEART ACETYL    <NA>
## 4     1.161548e-05     6.753184e-07 HEART:8w_F1_M1  HEART ACETYL    <NA>
## 5     1.875400e-05     1.308419e-06 HEART:8w_F1_M1  HEART ACETYL    <NA>
## 6     5.014476e-05     4.081550e-06 HEART:8w_F1_M1  HEART ACETYL    <NA>
##    adj_p_value graphical_cluster
## 1 2.101458e-08          8w_F1_M1
## 2 8.894370e-09          8w_F1_M1
## 3 4.472744e-05          8w_F1_M1
## 4 1.054033e-04          8w_F1_M1
## 5 1.259332e-04          8w_F1_M1
## 6 3.424679e-04          8w_F1_M1
writeLines(colnames(GRAPH_PW_ENRICH))
## query
## significant
## term_size
## query_size
## intersection_size
## precision
## recall
## term_id
## source
## term_name
## effective_domain_size
## source_order
## parents
## evidence_codes
## intersection
## gost_adj_p_value
## computed_p_value
## cluster
## tissue
## ome
## kegg_id
## adj_p_value
## graphical_cluster

The cluster column in GRAPH_PW_ENRICH prepends the cluster name with the tissue abbreviation. The graphical_cluster column does not. Let’s get all significant heart and gastroc enrichment results for these three paths.

enrich_res = GRAPH_PW_ENRICH[GRAPH_PW_ENRICH$cluster %in% names(MY_CLUSTERS) & GRAPH_PW_ENRICH$adj_p_value < 0.1,]
table(enrich_res$cluster, enrich_res$ome)
##                                                        
##                                                         ACETYL ATAC METAB
##   HEART:1w_F-1_M-1->2w_F-1_M-1->4w_F-1_M-1->8w_F-1_M-1       2    2     0
##   HEART:1w_F0_M1->2w_F0_M1->4w_F1_M1->8w_F1_M1               7    0     1
##   HEART:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1              16    0     2
##   SKM-GN:1w_F-1_M-1->2w_F-1_M-1->4w_F-1_M-1->8w_F-1_M-1      0    0     0
##   SKM-GN:1w_F0_M1->2w_F0_M1->4w_F1_M1->8w_F1_M1              0    0     0
##   SKM-GN:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1              0    2     0
##                                                        
##                                                         METHYL PHOSPHO PROT
##   HEART:1w_F-1_M-1->2w_F-1_M-1->4w_F-1_M-1->8w_F-1_M-1       7       0    2
##   HEART:1w_F0_M1->2w_F0_M1->4w_F1_M1->8w_F1_M1               0       9    3
##   HEART:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1               0      29   30
##   SKM-GN:1w_F-1_M-1->2w_F-1_M-1->4w_F-1_M-1->8w_F-1_M-1      0       0    1
##   SKM-GN:1w_F0_M1->2w_F0_M1->4w_F1_M1->8w_F1_M1              0       0    1
##   SKM-GN:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1              0       0   28
##                                                        
##                                                         TRNSCRPT
##   HEART:1w_F-1_M-1->2w_F-1_M-1->4w_F-1_M-1->8w_F-1_M-1        20
##   HEART:1w_F0_M1->2w_F0_M1->4w_F1_M1->8w_F1_M1                20
##   HEART:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1                11
##   SKM-GN:1w_F-1_M-1->2w_F-1_M-1->4w_F-1_M-1->8w_F-1_M-1        5
##   SKM-GN:1w_F0_M1->2w_F0_M1->4w_F1_M1->8w_F1_M1               10
##   SKM-GN:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1               15

Networks of PW enrichments

Okay, so now we know how pathway enrichment was performed. Great! Now let’s get to my favorite part - how to visualize large numbers of significant pathway enrichments.

Approach

After performing pathway enrichment for our graphical clusters of interest, we observed that many clusters had many dozens of significantly enriched pathways. In order to make these results more digestible, we leveraged visNetwork to construct interactive networks of pathway enrichments. These networks are conceptually similar to those made by the EnrichmentMap Cytoscape module, but it’s easier to generate them and explore the underlying data.

Briefly, each node in the network is a pathway, and edges connect highly similar pathways, i.e., those whose enrichments are driven by highly similar sets of genes across omes. This groups together highly similar pathway enrichments, which makes it easier to digest large numbers of results.

In our implementation, groups of similar pathway enrichments are color-coded, and group labels are inferred from the term names and parents of the pathways in the group. Larger nodes indicate that the pathway was significantly enriched in multiple datasets; thicker edges indicate higher similarity between the pathways.

Best of all, these networks are interactive! Use your cursor to zoom and drag, and hover over nodes and edges to see more information about the underlying data, namely the genes and datasets driving the pathway enrichment.

Note that you could run this function with any pathway enrichment results, as long as the columns of the input include “adj_p_value”, “ome”, “tissue”, “intersection”, “computed_p_value”, “term_name”, and “term_id” and the columns of feature_to_gene include “gene_symbol”, “ensembl_gene”. feature$ensembl_gene can be a dummy column if intersection_id_type = "gene_symbol".

Examples

Single tissue

Let’s focus on a single trajectory, first separately in each tissue. While it is possible to programatically render vizNetwork objects, it is not trivial, so let’s keep it simple here.

names(MY_CLUSTERS)
## [1] "SKM-GN:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1"        
## [2] "HEART:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1"         
## [3] "SKM-GN:1w_F-1_M-1->2w_F-1_M-1->4w_F-1_M-1->8w_F-1_M-1"
## [4] "HEART:1w_F-1_M-1->2w_F-1_M-1->4w_F-1_M-1->8w_F-1_M-1" 
## [5] "SKM-GN:1w_F0_M1->2w_F0_M1->4w_F1_M1->8w_F1_M1"        
## [6] "HEART:1w_F0_M1->2w_F0_M1->4w_F1_M1->8w_F1_M1"
enrichment_network_vis(tissues = "SKM-GN", 
                       cluster = "1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1", 
                       add_group_label_nodes = TRUE)
## Formatting inputs...
## Subsetting feature-to-gene map...
## Calculating similarity metric between pairs of enriched pathways...
## Constructing graph...
## Elapsed time: 15.391s
enrichment_network_vis(tissues = "HEART", 
                       cluster = "1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1",
                       title = "HEART:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1", 
                       add_group_label_nodes = TRUE)
## Formatting inputs...
## Subsetting feature-to-gene map...
## Calculating similarity metric between pairs of enriched pathways...
## Constructing graph...
## Elapsed time: 2.707s

Multiple tissues

Now let’s examine pathways that are enriched in the same graphical cluster in multiple tissues (heart and gastroc in our case). If we are specifically interested in pathways that are significantly enriched in multiple tissues, we can set multitissue_pathways_only to TRUE. This removes any pathways that are not enriched in more than one tissue (in at least one ome).

enrichment_network_vis(tissues = c("HEART","SKM-GN"),
                       cluster = "1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1", 
                       multitissue_pathways_only = TRUE,
                       add_group_label_nodes = TRUE,
                       title = "'All up' path in heart and gastroc - shared enrichments")
## Formatting inputs...
## Subsetting feature-to-gene map...
## Calculating similarity metric between pairs of enriched pathways...
## Constructing graph...
## Elapsed time: 2.868s
sessionInfo()
## R version 4.2.2 (2022-10-31)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur ... 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggplot2_3.4.0                   MotrpacRatTraining6mo_1.5.0    
## [3] MotrpacRatTraining6moData_1.8.0
## 
## loaded via a namespace (and not attached):
##  [1] RColorBrewer_1.1-3  numDeriv_2016.8-1.1 tools_4.2.2        
##  [4] bslib_0.4.2         utf8_1.2.2          R6_2.5.1           
##  [7] DBI_1.1.3           BiocGenerics_0.44.0 colorspace_2.0-3   
## [10] sn_2.1.0            withr_2.5.0         tidyselect_1.2.0   
## [13] gridExtra_2.3       mnormt_2.1.1        compiler_4.2.2     
## [16] cli_3.6.0           Biobase_2.58.0      TFisher_0.2.0      
## [19] sandwich_3.0-2      labeling_0.4.2      sass_0.4.4         
## [22] scales_1.2.1        mvtnorm_1.1-3       r2r_0.1.1          
## [25] stringr_1.5.0       digest_0.6.31       rmarkdown_2.19     
## [28] pkgconfig_2.0.3     htmltools_0.5.4     plotrix_3.8-2      
## [31] fastmap_1.1.0       limma_3.54.0        highr_0.10         
## [34] htmlwidgets_1.6.1   rlang_1.0.6         rstudioapi_0.14    
## [37] visNetwork_2.1.2    jquerylib_0.1.4     farver_2.1.1       
## [40] generics_0.1.3      zoo_1.8-11          jsonlite_1.8.4     
## [43] dplyr_1.0.10        magrittr_2.0.3      Matrix_1.5-3       
## [46] Rcpp_1.0.9          munsell_0.5.0       fansi_1.0.3        
## [49] viridis_0.6.2       lifecycle_1.0.3     stringi_1.7.12     
## [52] multcomp_1.4-20     yaml_2.3.6          edgeR_3.40.1       
## [55] ggraph_2.1.0        mathjaxr_1.6-0      MASS_7.3-58.1      
## [58] grid_4.2.2          ggrepel_0.9.2       lattice_0.20-45    
## [61] graphlayouts_0.8.4  splines_4.2.2       multtest_2.54.0    
## [64] locfit_1.5-9.7      qqconf_1.3.1        knitr_1.41         
## [67] pillar_1.8.1        igraph_1.3.5        codetools_0.2-18   
## [70] stats4_4.2.2        mutoss_0.1-12       glue_1.6.2         
## [73] evaluate_0.19       metap_1.8           data.table_1.14.6  
## [76] vctrs_0.5.1         tweenr_2.0.2        Rdpack_2.4         
## [79] gtable_0.3.1        purrr_1.0.1         polyclip_1.10-4    
## [82] tidyr_1.2.1         assertthat_0.2.1    cachem_1.0.6       
## [85] xfun_0.36           ggforce_0.4.1       rbibutils_2.2.13   
## [88] tidygraph_1.2.2     survival_3.5-0      viridisLite_0.4.1  
## [91] tibble_3.1.8        ellipsis_0.3.2      TH.data_1.1-1